
    Halvade: scalable sequence analysis with MapReduce

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50x coverage) in less than 3 hours with very high parallel efficiency. Even on a single multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of Hadoop distributions, including Cloudera and Amazon EMR.
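
    The map/shuffle/reduce structure behind this parallelization can be pictured as: align chunks of reads independently (map), group the aligned reads by genomic region (shuffle), and run the variant caller on each region (reduce). The Go sketch below is a hypothetical, in-memory illustration of that pattern only; Halvade itself is a Java framework that runs the real alignment and variant-calling tools under Hadoop MapReduce, and the mapAlign/reduceCall stand-ins here are invented placeholders.

```go
// Conceptual sketch of the map/shuffle/reduce pattern described above:
// read chunks are aligned concurrently ("map"), aligned reads are grouped
// by genomic region ("shuffle"), and each region is processed separately
// ("reduce"). The stand-in functions only simulate the real tools.
package main

import (
	"fmt"
	"sync"
)

type read string // a raw sequencing read (placeholder)

type alignedRead struct { // a read placed on the reference (placeholder)
	region string
	data   read
}

// mapAlign stands in for aligning one chunk of reads.
func mapAlign(chunk []read) []alignedRead {
	out := make([]alignedRead, 0, len(chunk))
	for i, r := range chunk {
		out = append(out, alignedRead{region: fmt.Sprintf("chr%d", i%3+1), data: r})
	}
	return out
}

// reduceCall stands in for calling variants on one genomic region.
func reduceCall(region string, reads []alignedRead) string {
	return fmt.Sprintf("%s: %d reads -> variants", region, len(reads))
}

func main() {
	chunks := [][]read{{"r1", "r2"}, {"r3", "r4"}, {"r5", "r6"}}

	// Map phase: align the chunks concurrently.
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		byRegion = map[string][]alignedRead{}
	)
	for _, c := range chunks {
		wg.Add(1)
		go func(c []read) {
			defer wg.Done()
			for _, ar := range mapAlign(c) {
				mu.Lock()
				byRegion[ar.region] = append(byRegion[ar.region], ar)
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()

	// Reduce phase: process each region's reads.
	for region, reads := range byRegion {
		fmt.Println(reduceCall(region, reads))
	}
}
```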

    elPrep 4: a multithreaded framework for sequence analysis

    We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment/map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practices pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data and up to 7.4x faster on WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.
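
    As a rough illustration of one of these preparation steps, the sketch below marks duplicates by grouping reads that map to the same position and strand and flagging all but the highest-scoring one. The record type and the scoring rule are invented simplifications, and optical-duplicate detection is omitted entirely; this is not elPrep 4's actual implementation, just the general idea.

```go
// Simplified duplicate marking: reads mapping to the same (chrom, pos, strand)
// are treated as duplicates of each other, and only the highest-scoring one
// is left unmarked. Real tools also consider mates, libraries, and optical
// coordinates; those details are omitted here.
package main

import "fmt"

type rec struct {
	name      string
	chrom     string
	pos       int
	reverse   bool
	score     int // e.g. sum of base qualities
	duplicate bool
}

func markDuplicates(recs []rec) {
	type key struct {
		chrom   string
		pos     int
		reverse bool
	}
	best := map[key]int{} // index of the best record seen so far per key
	for i := range recs {
		k := key{recs[i].chrom, recs[i].pos, recs[i].reverse}
		j, seen := best[k]
		if !seen {
			best[k] = i
			continue
		}
		// Keep the higher-scoring record; mark the other as a duplicate.
		if recs[i].score > recs[j].score {
			recs[j].duplicate = true
			best[k] = i
		} else {
			recs[i].duplicate = true
		}
	}
}

func main() {
	reads := []rec{
		{name: "a", chrom: "chr1", pos: 100, score: 90},
		{name: "b", chrom: "chr1", pos: 100, score: 95},
		{name: "c", chrom: "chr2", pos: 500, score: 80},
	}
	markDuplicates(reads)
	for _, r := range reads {
		fmt.Printf("%s duplicate=%v\n", r.name, r.duplicate)
	}
}
```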

    Multithreaded variant calling in elPrep 5

    We present elPrep 5, which updates the elPrep framework for processing sequence alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described by the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. elPrep 5 produces BAM and VCF output identical to that of GATK4 while significantly reducing the runtime by parallelizing and merging the execution of the pipeline steps. Our benchmarks show that elPrep 5 speeds up the runtime of the variant calling pipeline by a factor of 8-16x on both whole-exome and whole-genome data while using the same hardware resources as GATK4. This makes elPrep 5 a suitable drop-in replacement for GATK4 when faster execution times are needed.
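
    The "parallelizing and merging" claim can be pictured as a worker pool over genomic regions: prepared reads are partitioned per region, regions are processed concurrently, and the per-region outputs are merged into one result at the end. The sketch below shows only that scheduling pattern; the region type and the callRegion stand-in are hypothetical placeholders, not elPrep 5's haplotype caller.

```go
// Minimal worker-pool sketch: genomic regions are distributed over a pool of
// workers sized to the machine's CPU count, and the per-region results are
// merged into a single output stream.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

type region struct {
	chrom      string
	start, end int
}

// callRegion stands in for running a variant caller on one region.
func callRegion(r region) []string {
	return []string{fmt.Sprintf("%s:%d-%d variant", r.chrom, r.start, r.end)}
}

func main() {
	regions := []region{
		{"chr1", 1, 1000000},
		{"chr1", 1000001, 2000000},
		{"chr2", 1, 1000000},
	}

	jobs := make(chan region)
	results := make(chan []string)

	// One worker per available CPU core.
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				results <- callRegion(r)
			}
		}()
	}

	// Feed the regions, then signal that no more work is coming.
	go func() {
		for _, r := range regions {
			jobs <- r
		}
		close(jobs)
	}()

	// Close the result stream once all workers are done.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Merge per-region results into one output (the "VCF").
	for vs := range results {
		for _, v := range vs {
			fmt.Println(v)
		}
	}
}
```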

    elPrep: high-performance preparation of sequence alignment/map files for variant calling

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture, which allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundred hours of computing time, and thus substantially reduce analysis time and cost.
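
    The single-pass architecture can be sketched as filter composition: each preparation step is expressed as a per-record function, the functions are composed into one, and the data is traversed only once regardless of how many steps the pipeline contains. The record type and example filters below are hypothetical placeholders, not elPrep's real SAM/BAM handling; the point is just that adding a step does not add a pass over the data.

```go
// Filter-composition sketch: preparation steps become per-record filters that
// can modify a record or drop it, and composing them yields a pipeline that
// needs only a single pass over the data.
package main

import "fmt"

type record struct {
	name    string
	mapped  bool
	quality int
}

// A filter may modify the record; returning false drops it from the output.
type filter func(*record) bool

// compose chains filters into one per-record function.
func compose(filters ...filter) filter {
	return func(r *record) bool {
		for _, f := range filters {
			if !f(r) {
				return false
			}
		}
		return true
	}
}

func main() {
	pipeline := compose(
		func(r *record) bool { return r.mapped },                       // drop unmapped reads
		func(r *record) bool { return r.quality >= 20 },                // drop low-quality reads
		func(r *record) bool { r.name = "rg1:" + r.name; return true }, // rewrite a field
	)

	records := []record{
		{"r1", true, 60},
		{"r2", false, 60},
		{"r3", true, 5},
	}
	// One pass: every record flows through all steps exactly once.
	for i := range records {
		if pipeline(&records[i]) {
			fmt.Println(records[i].name)
		}
	}
}
```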

    An Extensible Interpreter Framework for Software Transactional Memory

    Software transactional memory (STM) is a new approach for coordinating concurrent threads, for which many different implementation strategies are currently being researched. In this paper we show that if a language implementation provides reflective access to explicit memory locations, it becomes straightforward both (a) to build an STM framework for this language and (b) to implement STM algorithms using this framework. A proof-of-concept implementation in the form of a Scheme interpreter (written in Common Lisp) is presented.
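
    The central idea, namely that explicit, reflectively accessible memory locations make an STM framework straightforward to layer on top, can be illustrated outside the paper's Scheme/Common Lisp setting. The sketch below is a generic, simplified STM in Go: transactional variables carry a version, a transaction logs its reads and writes, and commit validates the read set before applying the writes under a global lock. The TVar/Tx names and the lock-based commit are assumptions chosen for brevity, not the algorithms or framework from the paper.

```go
// Generic, simplified STM sketch: explicit memory locations (TVar) plus a
// transaction log (Tx) with validate-then-apply commit.
package main

import (
	"fmt"
	"sync"
)

// TVar is an explicit, transactionally managed memory location.
type TVar struct {
	value   int
	version int
}

// Tx records the versions it read and the values it intends to write.
type Tx struct {
	reads  map[*TVar]int
	writes map[*TVar]int
}

var commitLock sync.Mutex // serializes reads of committed state and commits

func NewTx() *Tx {
	return &Tx{reads: map[*TVar]int{}, writes: map[*TVar]int{}}
}

// Read returns a pending write if there is one, otherwise the committed value,
// remembering the version it saw for later validation.
func (t *Tx) Read(v *TVar) int {
	if val, ok := t.writes[v]; ok {
		return val
	}
	commitLock.Lock()
	defer commitLock.Unlock()
	if _, ok := t.reads[v]; !ok {
		t.reads[v] = v.version
	}
	return v.value
}

// Write buffers a new value; nothing is visible to other threads until Commit.
func (t *Tx) Write(v *TVar, val int) { t.writes[v] = val }

// Commit succeeds only if every location read is still at the version seen.
func (t *Tx) Commit() bool {
	commitLock.Lock()
	defer commitLock.Unlock()
	for v, ver := range t.reads {
		if v.version != ver {
			return false // conflict: the caller should retry the transaction
		}
	}
	for v, val := range t.writes {
		v.value = val
		v.version++
	}
	return true
}

func main() {
	account := &TVar{value: 100}
	// Retry loop: rerun the transaction until it commits without conflict.
	for {
		tx := NewTx()
		tx.Write(account, tx.Read(account)+50)
		if tx.Commit() {
			break
		}
	}
	fmt.Println("balance:", account.value)
}
```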